Tests overhaul #143
Merged
Tighten the tolerance for float64 from 1e-3 or 1e-4 to 1e-9. Relax the tolerance for float32 to 1e-2, because increasing the number of valid examples led to the discovery of explosion#142 ("dotv: float32 rounding error is 10x of NumPy"). Swap actual <-> desired in assert_almost_equal, as the two are not symmetric.
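Below is a minimal sketch of per-dtype tolerances and of why argument order can matter. It uses `numpy.testing.assert_allclose`, whose relative tolerance is scaled by the *desired* value. This message does not say whether the PR's own assertion helper behaves the same way. The `RTOL` mapping, `check_dot` and `passes` are hypothetical names, and reading the commit's values as relative tolerances is an assumption.

```python
import numpy as np
from numpy.testing import assert_allclose

# Hypothetical mapping: the values come from the commit message, but treating
# them as *relative* tolerances is an assumption.
RTOL = {np.float32: 1e-2, np.float64: 1e-9}


def check_dot(actual, a, b):
    """Compare a dot product against a float64 NumPy reference."""
    desired = np.dot(a.astype(np.float64), b.astype(np.float64))
    # Order matters: assert_allclose checks
    #   abs(actual - desired) <= atol + rtol * abs(desired)
    assert_allclose(actual, desired, rtol=RTOL[a.dtype.type], atol=0)


def passes(actual, desired, rtol):
    """Return True if assert_allclose(actual, desired, rtol) succeeds."""
    try:
        assert_allclose(actual, desired, rtol=rtol, atol=0)
    except AssertionError:
        return False
    return True


# A float32 result checked against the float64 reference with rtol=1e-2.
# Positive inputs avoid cancellation, which a purely relative check cannot absorb.
rng = np.random.default_rng(0)
a = rng.uniform(0.5, 1.5, 1000).astype(np.float32)
b = rng.uniform(0.5, 1.5, 1000).astype(np.float32)
check_dot(np.dot(a, b), a, b)

# The same pair of values with the arguments swapped:
print(passes(1.0, 1.1, rtol=0.095))  # True:  0.1 <= 0.095 * 1.1
print(passes(1.1, 1.0, rtol=0.095))  # False: 0.1 >  0.095 * 1.0
```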
Add test coverage for arrays of size 1. Remove unused, confusing default parameters from custom strategies.
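The following is a minimal sketch of a custom strategy that keeps size-1 arrays in the search space and exposes only parameters it actually uses. It relies on `hypothesis.extra.numpy.arrays` and `array_shapes`. The name `matrices`, the element bounds, and `max_side` are hypothetical and not taken from the PR.

```python
import numpy as np
from hypothesis import given, settings, strategies as st
from hypothesis.extra.numpy import arrays, array_shapes


def matrices(dtype, max_side=10):
    """Hypothetical strategy: 2-D arrays with each side between 1 and max_side.

    min_side=1 keeps 1x1 (size-1) arrays in the search space.
    """
    width = np.dtype(dtype).itemsize * 8  # 32 for float32, 64 for float64
    elements = st.floats(min_value=-1e3, max_value=1e3, width=width)
    shapes = array_shapes(min_dims=2, max_dims=2, min_side=1, max_side=max_side)
    return arrays(dtype, shapes, elements=elements)


@settings(max_examples=100)
@given(matrices(np.float32))
def test_matrices_are_nonempty_2d(m):
    assert m.ndim == 2 and m.size >= 1 and m.dtype == np.float32
```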
This is meant to be tweaked locally for more thorough (slower) tests.
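The setting meant for local tweaking is not named here; judging from the later mention of a "fixed and relatively low max_examples setting", it is presumably `max_examples`. One way to adjust that knob without editing code is Hypothesis settings profiles. This is an alternative technique, not necessarily what the PR does, and the profile names and values below are hypothetical.

```python
import os

from hypothesis import settings

# Hypothetical profiles: a fast default for CI and a slower, more thorough one
# for local runs. deadline=None avoids spurious timeouts on slow machines.
settings.register_profile("ci", max_examples=100, deadline=None)
settings.register_profile("thorough", max_examples=10_000, deadline=None)

# Pick a profile via an environment variable instead of editing the code,
# e.g. `HYPOTHESIS_PROFILE=thorough pytest`.
settings.load_profile(os.environ.get("HYPOTHESIS_PROFILE", "ci"))
```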
Mirrors the same test in test_gemm.py.
Improve readability. Given infinite Hypothesis examples, this commit does not introduce any functional changes. However, given a fixed and relatively low max_examples setting, it substantially increases test coverage:
1. It eliminates examples that were previously skipped by assume(). According to pytest --hypothesis-show-statistics, these were ~10% for dotv and ~20% for gemm.
2. It eliminates examples that ended up being duplicates due to trimming. For example, in dotv, Hypothesis could previously generate A=[1,2,3,4], B=[5,6] and A=[1,2], B=[5,6,7,8]; both would get trimmed to A=[1,2], B=[5,6].
3. In gemm, it removes an entire degree of freedom by removing the unused variable out_col, which again would result in duplicate examples.
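A minimal sketch of the idea, assuming composite strategies that draw the shared dimensions once and then build arrays that already match. With this approach there is nothing for `assume()` to reject and nothing to trim. The names `dotv_inputs`, `gemm_inputs`, `ELEMENTS` and `DIM`, and all bounds, are hypothetical rather than the PR's actual strategies.

```python
import numpy as np
from hypothesis import given, settings, strategies as st
from hypothesis.extra.numpy import arrays

# Hypothetical bounds, chosen only to keep products finite and examples small.
ELEMENTS = st.floats(min_value=-100, max_value=100)
DIM = st.integers(min_value=1, max_value=20)


@st.composite
def dotv_inputs(draw):
    # Draw the shared length once, so A and B always match: no
    # assume(len(A) == len(B)) rejections and no duplicates from trimming.
    n = draw(DIM)
    a = draw(arrays(np.float64, n, elements=ELEMENTS))
    b = draw(arrays(np.float64, n, elements=ELEMENTS))
    return a, b


@st.composite
def gemm_inputs(draw):
    # Three free dimensions: (m, k) @ (k, n) -> (m, n). The output shape is
    # derived from the inputs, so there is no separate output dimension to draw.
    m, k, n = draw(DIM), draw(DIM), draw(DIM)
    a = draw(arrays(np.float64, (m, k), elements=ELEMENTS))
    b = draw(arrays(np.float64, (k, n), elements=ELEMENTS))
    return a, b


@settings(max_examples=50)
@given(dotv_inputs())
def test_dotv_shapes(ab):
    a, b = ab
    assert a.shape == b.shape and a.size >= 1


@settings(max_examples=50)
@given(gemm_inputs())
def test_gemm_shapes(ab):
    a, b = ab
    assert a.shape[1] == b.shape[0]
    assert (a @ b).shape == (a.shape[0], b.shape[1])
```

In a real test, the body would compare the library's dotv or gemm against NumPy rather than only checking shapes.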
crusaderky commented Dec 8, 2025
```python
# Copyright ExplosionAI GmbH, released under BSD.
import numpy as np


np.random.seed(0)
```
This did nothing: https://hypothesis.readthedocs.io/en/latest/reference/strategies.html#hypothesis.strategies.random_module
Hypothesis always seeds global PRNGs before running a test, and restores the previous state afterwards.
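Here is a minimal sketch that checks this behaviour, assuming a recent Hypothesis version with NumPy already imported. The module-level seed does not change what the test body draws, and the caller's PRNG state is restored afterwards. `record_first_draw` and `first_draws` are hypothetical names.

```python
import numpy as np
from hypothesis import given, settings, strategies as st

first_draws = []


@settings(max_examples=1, database=None)
@given(st.just(None))
def record_first_draw(_):
    # By the time the body runs, Hypothesis has already seeded numpy's global PRNG.
    first_draws.append(np.random.random())


# A module-level seed does not affect what the test body sees...
np.random.seed(0)
record_first_draw()
np.random.seed(12345)
record_first_draw()
assert first_draws[0] == first_draws[-1]

# ...and the caller's PRNG state is restored once the test finishes.
np.random.seed(42)
record_first_draw()
after = np.random.random()
np.random.seed(42)
assert after == np.random.random()
```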
This PR thoroughly revisits the test suite. It is made of several commits, each with its own description. In addition to the changes described above, it adds a thread-safety test: given input arrays that are shared between multiple threads, it tests that dotv and gemm can run on them in parallel from multiple threads.
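Here is a minimal sketch of such a test. It assumes `np.dot` and `np.matmul` as stand-ins for the library's dotv and gemm, because their Python signatures are not shown in this PR; pass the real functions in instead. `run_concurrently` and `check_thread_safety` are hypothetical helpers. A `threading.Barrier` releases all workers at once to maximize overlap between threads.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

import numpy as np
from numpy.testing import assert_allclose


def run_concurrently(fn, n_threads=8):
    """Call fn() once in each of n_threads threads, releasing them together."""
    barrier = threading.Barrier(n_threads)

    def worker():
        barrier.wait()  # every thread starts fn() at roughly the same time
        return fn()

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        futures = [pool.submit(worker) for _ in range(n_threads)]
        return [f.result() for f in futures]


def check_thread_safety(dotv=np.dot, gemm=np.matmul, seed=0):
    rng = np.random.default_rng(seed)
    # Inputs shared by all threads
    a, b = rng.standard_normal(1000), rng.standard_normal(1000)
    A, B = rng.standard_normal((64, 32)), rng.standard_normal((32, 16))
    snapshots = [x.copy() for x in (a, b, A, B)]

    # Single-threaded reference results
    expected_dot = dotv(a, b)
    expected_gemm = gemm(A, B)

    for got in run_concurrently(lambda: dotv(a, b)):
        assert_allclose(got, expected_dot, rtol=1e-12, atol=1e-12)
    for got in run_concurrently(lambda: gemm(A, B)):
        assert_allclose(got, expected_gemm, rtol=1e-12, atol=1e-12)

    # The shared inputs must not have been modified by any thread
    for before, after in zip(snapshots, (a, b, A, B)):
        assert np.array_equal(before, after)
```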